Abstract

The approximation of tensors is important for the efficient numerical treatment of high dimensional
problems, but it remains an extremely challenging task. One of the most popular approaches to tensor
approximation is the alternating least squares method. In our study, the convergence of the alternating
least squares algorithm is considered. The analysis is done for arbitrary tensor format representations
and is based on the multilinearity of the tensor format. In tensor format representation techniques,
tensors are approximated by multilinear combinations of objects of lower dimensionality. The resulting
reduction of dimensionality not only reduces the amount of required storage but also the computational effort.

Keywords: tensor format, tensor representation, tensor network, alternating least squares optimisation,
orthogonal projection method.

MSC: 15A69, 49M20, 65K05, 68W25, 90C26. 

1 Introduction 


In recent years, tensor format representation techniques have been successfully applied to the solution of
high-dimensional problems like stochastic and parametric partial differential equations ||6j [H] [141 HOI |24l [^ 
l27l . With standard techniques it is impossible to store all entries of the discretised high-dimensional objects 
explicitly. The reason is that the computational complexity and the storage cost grow exponentially
with the number of dimensions. Besides the storage, one should also solve these high-dimensional problems
in a reasonable (e.g. linear) time and obtain a solution in some compressed (low-rank/sparse) tensor format.
Among other prominent problems, the efficient solution of linear systems is one of the most important tasks in
scientific computing.

We consider a minimisation problem on the tensor space $V = \bigotimes_{\mu=1}^{d} \mathbb{R}^{m_\mu}$ equipped with the Euclidean inner
product $\langle\cdot,\cdot\rangle$. The objective function $f : V \to \mathbb{R}$ of the optimisation task is quadratic

\[ f(v) := \frac{1}{\|b\|^2}\left[\frac{1}{2}\langle Av, v\rangle - \langle b, v\rangle\right], \tag{1} \]

where $A \in \mathbb{R}^{\dim V \times \dim V}$ is a positive definite matrix ($A > 0$, $A^T = A$) and $b \in V$. A tensor $u \in V$ is
represented in a tensor format. A tensor format $U : P_1 \times \dots \times P_L \to V$ is a multilinear map from the Cartesian
product of parameter spaces $P_1, \dots, P_L$ into the tensor space $V$. An $L$-tuple of vectors $(p_1, \dots, p_L) \in P :=
P_1 \times \dots \times P_L$ is called a representation system of $u$ if $u = U(p_1, \dots, p_L)$. The precise definition of tensor
format representations is given in Section 2. The solution $A^{-1}b = \operatorname{argmin}_{v \in V} f(v)$ is approximated by elements
from the range set of the tensor format $U$, i.e. we are looking for a representation system $(p_1^*, \dots, p_L^*) \in P$
such that for

\[ F := f \circ U : P \to V \to \mathbb{R}, \qquad
F(p_1, \dots, p_L) = \frac{1}{\|b\|^2}\left[\frac{1}{2}\langle A\,U(p_1, \dots, p_L), U(p_1, \dots, p_L)\rangle - \langle b, U(p_1, \dots, p_L)\rangle\right] \tag{2} \]

we have

\[ F(p_1^*, \dots, p_L^*) = \min_{(p_1, \dots, p_L) \in P} F(p_1, \dots, p_L). \]

The alternating least squares (ALS) algorithm lU [3l [TOl [HI |2T] |3T1 is iteratively defined. Suppose that the 

$k$-th iterate $p^k = (p_1^k, \dots, p_L^k)$ and the first $\mu - 1$ components $p_1^{k+1}, \dots, p_{\mu-1}^{k+1}$ of the $(k+1)$-th iterate $p^{k+1}$
have been determined. The basic step of the ALS algorithm is to compute the minimum norm solution

\[ p_\mu^{k+1} := \operatorname{argmin}_{q_\mu \in P_\mu} F(p_1^{k+1}, \dots, p_{\mu-1}^{k+1}, q_\mu, p_{\mu+1}^{k}, \dots, p_L^{k}). \]

Thus, in order to obtain $p^{k+1}$ from $p^k$, we have to solve successively $L$ ordinary least squares problems.

The ALS algorithm is a nonlinear Gauss-Seidel method. The local convergence of the nonlinear Gauss-Seidel
method to a stationary point $p^* \in P$ follows from the convergence of the linear Gauss-Seidel method applied
to the Hessian $F''(p^*)$ at the limit point $p^*$. If the linear Gauss-Seidel method converges R-linearly, then there
exists a neighbourhood $B(p^*)$ of $p^*$ such that for every initial guess in $B(p^*)$ the nonlinear Gauss-Seidel
method converges R-linearly with the same rate as the linear Gauss-Seidel method. We refer the reader to
Ortega and Rheinboldt for a description of the nonlinear Gauss-Seidel method [28, Section 7.4] and its convergence
analysis [28, Thm. 10.3.5, Thm. 10.3.4, and Thm. 10.1.3]. A representation system of a represented tensor is
not unique, since the tensor representation $U$ is multilinear. Consequently, the matrix $F''(p^*)$ is not positive
definite. Therefore, convergence of the linear Gauss-Seidel method is in general not ensured. However, if the
Hessian matrix at $p^*$ is positive semidefinite then the linear Gauss-Seidel method still converges for sequences
orthogonal to the kernel of F"{p*), see e.g. |[T^l2^ . Under useful assumptions on the null space of F"{p*), 
Uschmajew et al. showed local convergence of the ALS method. These assumptions are related to the
nonuniqueness of a representation system and are meaningful in the context of a nonlinear Gauss-Seidel method.
However, for tensor format representations the assumptions are not true in general, see the counterexample of
Mohlenkamp [25, Section 2.5] and the discussion in [36, Section 3.4].

The current analysis is not based on the mathematical techniques developed for the nonlinear Gauss-Seidel
method, but on the multilinearity of the tensor representation $U$. This is in contrast to previous works. The
present article is partially related to the study by Mohlenkamp [25]. For example, the statement of Lemma
4.14 is already described there for the canonical tensor format.

Section 2 contains a unified mathematical description of tensor formats. The relation between an orthogonal
projection method and the ALS algorithm is explained in Section 3. The convergence of the ALS method is
analysed in Section 4, where we consider global convergence. Further, the rate of convergence is described
in detail and explicit examples for all kinds of convergence rates are given. The ALS method can converge for
all tensor formats of practical interest sublinearly, Q-linearly, and even Q-superlinearly*. We illustrate our
theoretical results on numerical examples in Section 5.

*We refer the reader to [28] for details concerning convergence speed.

2 Unified Description of Tensor Format Representations


A tensor format representation for tensors in $V$ is described by a parameter space $P = P_1 \times \dots \times P_L$ and a
multilinear map $U : P \to V$ from the parameter space into the tensor space. For the numerical treatment of
high dimensional problems by means of tensor formats it is essential to distinguish between a tensor $u \in V$
and a representation system $p \in P$ of $u$, where $u = U(p)$. The data size of a representation system is
often proportional to $d$. Thanks to the multilinearity of $U$, the numerical cost of standard operations like
matrix vector multiplication, addition, and computation of scalar products is also proportional to $d$, see e.g.

||ia[l5l[l7l[3S[3l. 

Notation 2.1 ($\mathbb{N}_n$). The set $\mathbb{N}_n$ of natural numbers not exceeding $n \in \mathbb{N}$ is denoted by

\[ \mathbb{N}_n := \{ j \in \mathbb{N} : 1 \le j \le n \}. \]


Definition 2.2 (Parameter Space, Tensor Format Representation, Representation System). Let $L \ge d$, $\mu \in \mathbb{N}_L$,
and $P_\mu$ a finite dimensional vector space equipped with an inner product $\langle\cdot,\cdot\rangle_{P_\mu}$. The parameter space $P$ is
the following Cartesian product

\[ P := P_1 \times \dots \times P_L. \tag{3} \]

A multilinear map $U$ from the parameter space $P$ into the tensor space $V$ is called a tensor format representation

\[ U : P_1 \times \dots \times P_L \to \bigotimes_{\nu=1}^{d} \mathbb{R}^{m_\nu}. \tag{4} \]

We say $u \in V$ is represented in the tensor format representation $U$ if $u \in \operatorname{range} U$. A tuple $(p_1, \dots, p_L) \in P$ is
called a representation system of $u$ if $u = U(p_1, \dots, p_L)$.

Remark 2.3. Due to the multilinearity of $U$, a representation system of a given tensor $u \in \operatorname{range}(U)$ is not
uniquely determined.

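Example 2.4. For the canonical tensor format representation with $r$ terms we have $L = d$ and $P_\mu = \mathbb{R}^{m_\mu \times r}$.
The canonical tensor format representation with $r$ terms is the following multilinear map

\[ U_{CF} : \mathbb{R}^{m_1 \times r} \times \dots \times \mathbb{R}^{m_d \times r} \to V, \qquad
(p_1, \dots, p_d) \mapsto U_{CF}(p_1, \dots, p_d) := \sum_{j=1}^{r} \bigotimes_{\mu=1}^{d} p_{\mu,j}, \]

where $p_{\mu,j}$ denotes the $j$-th column of the matrix $p_\mu \in \mathbb{R}^{m_\mu \times r}$. For recent algorithms in the canonical tensor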

format we refer to (0 ID H] [77] UTS . 


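The tensor train (TT) format representation discussed in [30] is for $d = 3$ and representation ranks $r_1, r_2 \in \mathbb{N}$
defined by the multilinear map

\[ U_{TT} : \mathbb{R}^{m_1 \times r_1} \times \mathbb{R}^{m_2 \times r_1 \times r_2} \times \mathbb{R}^{m_3 \times r_2} \to V, \qquad
(p_1, p_2, p_3) \mapsto U_{TT}(p_1, p_2, p_3) := \sum_{i=1}^{r_1} \sum_{j=1}^{r_2} p_{1,i} \otimes p_{2,i,j} \otimes p_{3,j}. \]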
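The two formats above can be made concrete with a few lines of NumPy. The following is only a minimal sketch: the function names `canonical_format` and `tt_format`, the array layouts, and the random test data are assumptions chosen for illustration and do not come from the article. The sketch checks numerically that the canonical map is linear in its first argument and that rescaling two factors leaves the represented tensor unchanged (cf. Remark 2.3).

```python
import numpy as np

def canonical_format(factors):
    """Hypothetical helper: sum_j p_{1,j} (x) ... (x) p_{d,j}, factors[mu] has shape (m_mu, r)."""
    r = factors[0].shape[1]
    u = np.zeros(tuple(p.shape[0] for p in factors))
    for j in range(r):
        term = factors[0][:, j]
        for p in factors[1:]:
            term = np.multiply.outer(term, p[:, j])  # build the elementary tensor
        u += term
    return u

def tt_format(p1, p2, p3):
    """Hypothetical helper for d = 3: p1 (m1, r1), p2 (m2, r1, r2), p3 (m3, r2)."""
    return np.einsum("ai,bij,cj->abc", p1, p2, p3)

rng = np.random.default_rng(0)
p = [rng.standard_normal((m, 2)) for m in (3, 4, 5)]
q1 = rng.standard_normal((3, 2))

# Linearity in the first argument (multilinearity is checked one slot at a time).
lhs = canonical_format([2.0 * p[0] + q1, p[1], p[2]])
rhs = 2.0 * canonical_format(p) + canonical_format([q1, p[1], p[2]])
linear_in_first_argument = np.allclose(lhs, rhs)

# Non-uniqueness: two different representation systems of the same tensor.
same_tensor = np.allclose(canonical_format([3.0 * p[0], p[1] / 3.0, p[2]]),
                          canonical_format(p))

tt_shape = tt_format(rng.standard_normal((3, 2)),
                     rng.standard_normal((4, 2, 3)),
                     rng.standard_normal((5, 3))).shape
```

The storage of the representation systems here is the sum of the factor sizes, whereas the full tensor has the product of the mode sizes as its number of entries.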
3 Orthogonal Projection Method and Alternating Least Squares Algorithm

It is shown in the following that the ALS algorithm is an orthogonal projection method on subspaces of V = 
. For a better understanding, we briefly repeat the description of projection methods, see e.g. ll4llMl 
for a detailed description. 

An orthogonal projection method for solving the linear system $Av = b$ is defined by means of a sequence
$(\mathcal{K}_k)_{k \in \mathbb{N}}$ of subspaces of $V$ and the construction of a sequence $(v_k)_{k \in \mathbb{N}} \subset V$ such that

\[ v_{k+1} \in \mathcal{K}_k \quad\text{and}\quad r_{k+1} = b - A v_{k+1} \perp \mathcal{K}_k. \]

A prototype of a projection method is given in Algorithm 1.


Algorithm 1 Prototype Projection Method
1: while Stop Condition do
2:   Compute an orthonormal basis $V_k$ (stored column-wise) of $\mathcal{K}_k$
3:   $r_k := b - A v_k$
4:   $v_{k+1} := v_k + V_k (V_k^T A V_k)^{-1} V_k^T r_k$
5:   $k \mapsto k + 1$
6: end while
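Algorithm 1 can be sketched in NumPy as follows. This is an illustrative sketch only: the function name `projection_method`, the callback `subspace`, the stopping rule based on the relative residual, and the choice $\mathcal{K}_k = \operatorname{span}(r_k)$ (which yields the steepest descent method) are assumptions made for the example and are not prescribed by the article.

```python
import numpy as np

def projection_method(A, b, v0, subspace, tol=1e-10, max_iter=500):
    """Hypothetical sketch of the prototype projection method.

    subspace(k, v, r) returns a matrix whose columns span K_k.
    """
    v = v0.copy()
    for k in range(max_iter):
        r = b - A @ v                               # current residual r_k
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            break                                   # assumed stop condition
        V, _ = np.linalg.qr(subspace(k, v, r))      # orthonormal basis V_k of K_k
        v = v + V @ np.linalg.solve(V.T @ A @ V, V.T @ r)
    return v

rng = np.random.default_rng(0)
M = rng.standard_normal((6, 6))
A = M.T @ M / 6 + np.eye(6)                         # symmetric positive definite test matrix
b = rng.standard_normal(6)

steepest = lambda k, v, r: r[:, None]               # K_k = span(r_k)
v = projection_method(A, b, np.zeros(6), steepest)

# One step by hand: the new residual is orthogonal to the chosen subspace.
r0 = b.copy()
V0, _ = np.linalg.qr(r0[:, None])
v1 = V0 @ np.linalg.solve(V0.T @ A @ V0, V0.T @ r0)
orthogonality_defect = np.abs(V0.T @ (b - A @ v1)).max()
```

The update in line 4 of Algorithm 1 is exactly what forces $V_k^T r_{k+1} = 0$, which the last three lines check for a single step.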

Notation 3.1 ($L(A, B)$). Let $A$, $B$ be two arbitrary vector spaces. The vector space of linear maps from $A$ to
$B$ is denoted by

\[ L(A, B) := \{ \varphi : A \to B : \varphi \text{ is linear} \}. \]

In the following, let $U : P \to V$ be a tensor format representation, see Definition 2.2. We need to
define subspaces of $V$ in order to show that the ALS algorithm is an orthogonal projection method.
The multilinearity of $U$ and the special form of the ALS micro-step are important for the definition of
these subspaces. Let $\mu \in \mathbb{N}_L$ and $v \in V$ be a tensor represented in the tensor format $U$, i.e. there is
$(p_1, \dots, p_\mu, \dots, p_L) \in P$ such that $v = U(p_1, \dots, p_\mu, \dots, p_L)$. Since the tensor format
representation $U$ is multilinear we can define a linear map $W_\mu(p_1, \dots, p_{\mu-1}, p_{\mu+1}, \dots, p_L) \in L(P_\mu, V)$
such that $v = W_\mu(p_1, \dots, p_{\mu-1}, p_{\mu+1}, \dots, p_L)\, p_\mu$. The map $W_\mu$ depends multilinearly on the parameters
$p_1, \dots, p_{\mu-1}, p_{\mu+1}, \dots, p_L$. The linear subspace $\operatorname{range}(W_\mu(p_1, \dots, p_{\mu-1}, p_{\mu+1}, \dots, p_L)) \subseteq V$ is of great
importance for the ALS method. For the rest of the article, we identify linear maps with their canonical matrix
representations.

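Definition 3.2. Let $\mu \in \mathbb{N}_L$. We write for a given representation system $p = (p_1, \dots, p_L) \in P$

\[ p^{[\mu]} := (p_1, \dots, p_{\mu-1}, p_{\mu+1}, \dots, p_L) \]

and define

\[ W_{\mu, p^{[\mu]}} : P_\mu \to V, \qquad
\tilde p_\mu \mapsto W_{\mu, p^{[\mu]}}\, \tilde p_\mu := U(p_1, \dots, p_{\mu-1}, \tilde p_\mu, p_{\mu+1}, \dots, p_L). \tag{5} \]

We simply write $W_\mu$ for $W_{\mu, p^{[\mu]}}$, i.e. $W_\mu := W_{\mu, p^{[\mu]}}$, if it is clear from the context which representation system
is considered.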

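Proposition 3.3. Let $\mu \in \mathbb{N}_L$ and $p = (p_1, \dots, p_L) \in P$. The following holds:

(i) $W_{\mu, p^{[\mu]}}$ is a linear map and $\operatorname{range}(W_{\mu, p^{[\mu]}})$ is a linear subspace of $V$.

(ii) We have $\operatorname{rank}(W_{\mu, p^{[\mu]}}) \le \dim(P_\mu)$.

(iii) $\operatorname{range}(W_{\mu, p^{[\mu]}}) \subseteq \operatorname{range}(U)$, i.e. for all $v \in \operatorname{range}(W_{\mu, p^{[\mu]}})$ there exists $\tilde p_\mu \in P_\mu$ such that

\[ v = U(p_1, \dots, p_{\mu-1}, \tilde p_\mu, p_{\mu+1}, \dots, p_L). \]

(iv) Set $H_\mu := W_{\mu, p^{[\mu]}}^T W_{\mu, p^{[\mu]}} \in \mathbb{R}^{\dim P_\mu \times \dim P_\mu}$ and let $H_\mu = \tilde U_\mu \tilde D_\mu \tilde U_\mu^T$ be the diagonalisation of the
square matrix $H_\mu$, where $\tilde D_\mu = \operatorname{diag}(\delta_{i,\mu})_{i=1,\dots,\dim P_\mu}$ with $\delta_{1,\mu} \ge \delta_{2,\mu} \ge \dots \ge \delta_{\dim P_\mu,\mu}$. Define
further $D_\mu := \operatorname{diag}(\delta_{i,\mu})_{i=1,\dots,\operatorname{rank} W_{\mu, p^{[\mu]}}}$ and let $U_\mu$ consist of the first $\operatorname{rank} W_{\mu, p^{[\mu]}}$ columns of $\tilde U_\mu$. The columns of

\[ V_\mu := W_{\mu, p^{[\mu]}}\, U_\mu D_\mu^{-1/2} \tag{6} \]

form an orthonormal basis of $\operatorname{range}(W_{\mu, p^{[\mu]}})$ and

\[ V_\mu \tilde p_\mu = W_{\mu, p^{[\mu]}}\, U_\mu D_\mu^{-1/2} \tilde p_\mu = U(p_1, \dots, p_{\mu-1}, U_\mu D_\mu^{-1/2} \tilde p_\mu, p_{\mu+1}, \dots, p_L). \tag{7} \]

(v) The map

\[ W_\mu : P_1 \times \dots \times P_{\mu-1} \times P_{\mu+1} \times \dots \times P_L \to L(P_\mu, V), \qquad p^{[\mu]} \mapsto W_{\mu, p^{[\mu]}} \]

is multilinear.

Proof. Note that $W_{\mu, p^{[\mu]}}$ is linear, since the tensor format $U$ is multilinear. The rest of the assertions follow
after short calculations, where the last assertion (v) is a direct consequence of the multilinearity of $U$. ■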
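The construction of the orthonormal basis $V_\mu = W_\mu U_\mu D_\mu^{-1/2}$ from Proposition 3.3 (iv) can be checked numerically. The following sketch is illustrative only: the helper name `lowdin_basis`, the relative rank tolerance, and the random rank-deficient matrix standing in for $W_\mu$ are assumptions and do not appear in the article.

```python
import numpy as np

def lowdin_basis(W, rtol=1e-10):
    """Hypothetical helper: orthonormal basis V = W U D^{-1/2} of range(W)."""
    H = W.T @ W
    delta, U_full = np.linalg.eigh(H)             # eigenvalues in ascending order
    order = np.argsort(delta)[::-1]               # reorder to delta_1 >= delta_2 >= ...
    delta, U_full = delta[order], U_full[:, order]
    rank = int(np.sum(delta > rtol * delta[0]))   # numerical rank of W (assumed tolerance)
    U, D = U_full[:, :rank], delta[:rank]
    return W @ U / np.sqrt(D), U, D               # scale column i by delta_i^{-1/2}

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 3)) @ rng.standard_normal((3, 5))  # stand-in with rank 3 < 5
V, U, D = lowdin_basis(W)

is_orthonormal = np.allclose(V.T @ V, np.eye(V.shape[1]))
spans_range = np.allclose(V @ (V.T @ W), W)       # projecting onto range(V) reproduces W
```

Only the eigenvectors belonging to nonzero eigenvalues enter $U_\mu$, which is why the construction also works when $W_\mu$ has a nontrivial kernel.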

Remark 3.4. In chemistry, the definition of $V_\mu$ in Proposition 3.3 (iv) is often called the Löwdin transformation,
see 05] Section 3.4.5], Nevertheless, the construction can be found in several proofs for the existence of the 
singular value decomposition, see e.g. KT6\ Lemma 2.19]. 

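Definition 3.5. Let $\mu \in \mathbb{N}_L$, $p = (p_1, \dots, p_L) \in P$, and $F : P \to \mathbb{R}$ as defined in Eq. (2). We define

\[ F_{\mu, p^{[\mu]}} : P_\mu \to \mathbb{R}, \qquad
\tilde p_\mu \mapsto F_{\mu, p^{[\mu]}}(\tilde p_\mu) := F(p_1, \dots, p_{\mu-1}, \tilde p_\mu, p_{\mu+1}, \dots, p_L). \tag{8} \]

We write for convenience $F_\mu := F_{\mu, p^{[\mu]}}$ if it is clear from the context which representation system is considered.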

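Lemma 3.6. Let $\mu \in \mathbb{N}_L$ and $p = (p_1, \dots, p_L) \in P$. We have

(i) $F_\mu'(q_\mu) = -\frac{1}{\|b\|^2} W_\mu^T (b - A W_\mu q_\mu)$,

(ii) $(V_\mu^T A V_\mu)^{-1} V_\mu^T b = \operatorname{argmin}_{q_\mu} F_\mu(q_\mu)$,

where $V_\mu$ is defined in Eq. (6).

Proof. (i): Let $q_\mu \in P_\mu$. We have $f(W_\mu q_\mu) = F_\mu(q_\mu)$ for all $\mu \in \mathbb{N}_L$ and

\[ F_\mu(q_\mu) = \frac{1}{\|b\|^2}\left[\frac{1}{2}\langle A W_\mu q_\mu, W_\mu q_\mu\rangle - \langle b, W_\mu q_\mu\rangle\right]
= \frac{1}{\|b\|^2}\left[\frac{1}{2}\langle W_\mu^T A W_\mu q_\mu, q_\mu\rangle - \langle W_\mu^T b, q_\mu\rangle\right]. \]

Since $W_\mu^T A W_\mu$ is symmetric, we have $F_\mu'(q_\mu) = \frac{1}{\|b\|^2}\left(W_\mu^T A W_\mu q_\mu - W_\mu^T b\right) = -\frac{1}{\|b\|^2} W_\mu^T (b - A W_\mu q_\mu)$.

(ii): For $V_\mu q_\mu \in \operatorname{range}(W_\mu)$ we can write (identifying $F_\mu(q_\mu)$ with $f(V_\mu q_\mu)$ in the coordinates of the basis $V_\mu$)

\[ F_\mu(q_\mu) = \frac{1}{\|b\|^2}\left[\frac{1}{2}\langle V_\mu^T A V_\mu q_\mu, q_\mu\rangle - \langle V_\mu^T b, q_\mu\rangle\right]. \]

Since $V_\mu$ is a basis of $\operatorname{range}(W_\mu)$, we have that $V_\mu^T A V_\mu$ is positive definite and therefore

\[ p_\mu^* = \operatorname{argmin}_{q_\mu} F_\mu(q_\mu) \;\Leftrightarrow\; V_\mu^T A V_\mu p_\mu^* - V_\mu^T b = 0 \;\Leftrightarrow\; p_\mu^* = (V_\mu^T A V_\mu)^{-1} V_\mu^T b. \]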


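Theorem 3.7. Let $\mu \in \mathbb{N}_L$ and $p = (p_1, \dots, p_L) \in P$. We have

\[ p_\mu^* = \operatorname{argmin}_{q_\mu} F_\mu(q_\mu) \;\Leftrightarrow\; b - A V_\mu p_\mu^* \perp \operatorname{range} W_\mu, \]

where $V_\mu$ is from Eq. (6).

Proof. Follows from Lemma 3.6 and the orthogonal projection theorem. ■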


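Algorithm 2 Alternating Least Squares (ALS) Algorithm
1: Set $k := 1$ and choose an initial guess $p^1 = (p_1^1, \dots, p_L^1) \in P$, $p^{1,0} := p^1$, and $v_1 := U(p^1)$.
2: while Stop Condition do
3:   $v_{k,0} := v_k$
4:   for $1 \le \mu \le L$ do
5:     Compute an orthonormal basis $V_{k,\mu}$ of the range space of $W_{k,\mu} := W_\mu(p_1^{k+1}, \dots, p_{\mu-1}^{k+1}, p_{\mu+1}^{k}, \dots, p_L^{k})$, see e.g. Eq. (5) and (6).
6:     $p_\mu^{k+1} := U_{k,\mu} D_{k,\mu}^{-1/2} (V_{k,\mu}^T A V_{k,\mu})^{-1} D_{k,\mu}^{-1/2} U_{k,\mu}^T\, W_\mu^T(p_1^{k+1}, \dots, p_{\mu-1}^{k+1}, p_{\mu+1}^{k}, \dots, p_L^{k})\, b = G_{k,\mu}^{+} W_{k,\mu}^T b$   (9)
       $p^{k,\mu} := (p_1^{k+1}, \dots, p_{\mu-1}^{k+1}, p_\mu^{k+1}, p_{\mu+1}^{k}, \dots, p_L^{k})$
       $r_{k,\mu} := b - A v_{k,\mu}$
       $v_{k,\mu+1} := v_{k,\mu} + V_{k,\mu} (V_{k,\mu}^T A V_{k,\mu})^{-1} V_{k,\mu}^T r_{k,\mu} = V_{k,\mu} (V_{k,\mu}^T A V_{k,\mu})^{-1} V_{k,\mu}^T b$,
       where $U_{k,\mu} D_{k,\mu}^{-1/2}$ is from Eq. (6), $G_{k,\mu} := W_{k,\mu}^T A W_{k,\mu}$, and $G_{k,\mu}^{+}$ denotes the pseudoinverse of $G_{k,\mu}$
7:   end for
8:   $p^{k+1} := p^{k,L}$ and $v_{k+1} := U(p^{k+1})$
9:   $k \mapsto k + 1$
10: end while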
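A compact numerical sketch of the ALS iteration is given below. It is not the authors' implementation: the names `cp`, `partial_map` and `als`, the fixed number of sweeps used instead of a stop condition, the use of `numpy.linalg.lstsq` to obtain the minimum-norm solution of the normal equation, and the random test problem are all assumptions for illustration. The matrix of $W_\mu$ is assembled generically by applying the format to unit parameter vectors, which only uses the multilinearity of $U$.

```python
import numpy as np

def cp(factors):
    """Canonical format for d = 3 (illustrative): sum_j p_{1,j} (x) p_{2,j} (x) p_{3,j}."""
    return np.einsum("ar,br,cr->abc", *factors)

def partial_map(U, params, mu):
    """Matrix of W_mu : q -> U(p_1, ..., q, ..., p_L), built column by column."""
    size, shape = params[mu].size, params[mu].shape
    cols = []
    for i in range(size):
        e = np.zeros(size)
        e[i] = 1.0
        trial = list(params)
        trial[mu] = e.reshape(shape)              # i-th unit vector of P_mu
        cols.append(U(trial).ravel())
    return np.column_stack(cols)

def als(U, A, b, params, sweeps=10):
    """Hypothetical ALS sketch; returns final parameters and all values of f."""
    params = [p.copy() for p in params]
    nb2 = b @ b
    f = lambda v: (0.5 * v @ A @ v - b @ v) / nb2   # objective from Eq. (1)
    history = [f(U(params).ravel())]
    for _ in range(sweeps):
        for mu in range(len(params)):             # one micro-step per parameter
            W = partial_map(U, params, mu)
            G = W.T @ A @ W
            # minimum-norm solution of the normal equation G q = W^T b
            q = np.linalg.lstsq(G, W.T @ b, rcond=1e-10)[0]
            params[mu] = q.reshape(params[mu].shape)
            history.append(f(U(params).ravel()))
    return params, history

rng = np.random.default_rng(0)
shape, r = (3, 3, 3), 2
N = int(np.prod(shape))
S = rng.standard_normal((N, N))
A = np.eye(N) + 0.1 * S @ S.T / N                 # symmetric positive definite
b = rng.standard_normal(N)
p0 = [rng.standard_normal((m, r)) for m in shape]
params, history = als(cp, A, b, p0)
```

Building $W_\mu$ column by column is wasteful for large problems; it is used here only because it works for any multilinear format without format-specific code.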


Remark 3.8. From the definition of $p_\mu^{k+1}$, it follows directly that $p_\mu^{k+1} \perp \operatorname{kernel}(W_{k,\mu})$ and that $p_\mu^{k+1}$ is the vector
with smallest norm that fulfils the normal equation $G_{k,\mu}\, p_\mu^{k+1} = W_{k,\mu}^T b$. This is very important for the
convergence analysis of the ALS method and we like to point out that our results are based on this condition. We
must pay special attention to a correct implementation of an ALS micro-step in order to fulfil this essential
property.
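The property stressed in Remark 3.8 can be illustrated with a small experiment. The sketch below assumes $A = \mathrm{id}$ and a random rank-deficient matrix as a stand-in for $W_{k,\mu}$; the variable names are hypothetical. It compares the minimum-norm solution of the normal equation with another solution that differs by a kernel component: both give the same tensor, but only the first is orthogonal to the kernel.

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.standard_normal((10, 3)) @ rng.standard_normal((3, 6))   # rank 3, kernel of dimension 3
A = np.eye(10)                                                   # assumption: A = id
b = rng.standard_normal(10)

G = W.T @ A @ W
rhs = W.T @ b
# lstsq returns the minimum-norm solution of the (consistent) normal equation
p_min = np.linalg.lstsq(G, rhs, rcond=1e-10)[0]

_, _, Vt = np.linalg.svd(W)
kernel = Vt[3:].T                                 # orthonormal basis of kernel(W)
p_other = p_min + kernel @ np.ones(3)             # another solution of G p = W^T b

solves_normal_equation = np.allclose(G @ p_min, rhs) and np.allclose(G @ p_other, rhs)
orthogonal_to_kernel = np.allclose(kernel.T @ p_min, 0.0, atol=1e-8)
same_tensor = np.allclose(W @ p_min, W @ p_other)
```

A solver that returns an arbitrary solution of the normal equation (for instance after adding a kernel component, as `p_other` does) produces the same iterate $v_{k,\mu+1}$ but a different parameter vector, which is exactly what the remark warns about.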
Figure 1: Graphical illustration of an ALS micro-step for the case when $A = \mathrm{id}$. At the current iteration step,
we define the linear map $W_{k,\mu} \in L(P_\mu, V)$ by means of $v_{k,\mu}$ and the multilinearity of $U$, cf. Definition 3.2.
The successor $v_{k,\mu+1}$ is then the best approximation of $b$ on the subspace $\operatorname{range}(W_{k,\mu}) \subseteq V$.

4 Convergence Analysis 


We consider global convergence of the ALS method. The convergence analysis for an arbitrary tensor format
representation $U : P_1 \times \dots \times P_L \to V$ is a quite challenging task. The objective function $F$ from Eq. (2) is
highly nonlinear. Even the existence of a minimum is in general not ensured, see [5] and [22]. We need
further assumptions on the sequence from the ALS method. In order to justify our assumptions, let us study an
example from Lim and de Silva where it is shown that the tensor

\[ b = x \otimes x \otimes y + x \otimes y \otimes x + y \otimes x \otimes x \]

with tensor rank 3 has no best tensor rank 2 approximation. Lim and de Silva explained this by constructing a
sequence of rank 2 tensors with

\[ v_k = \left(x + \tfrac{1}{k} y\right) \otimes \left(x + \tfrac{1}{k} y\right) \otimes (k x + y) - x \otimes x \otimes k x \xrightarrow[k \to \infty]{} b. \]

The linear map $W_{1,k}$ from Definition 3.2 and the first component vector $p_1^k$ of the parameter system have the
following form:

\[ W_{1,k} = \left[ \left(x + \tfrac{1}{k} y\right) \otimes \left(x + \tfrac{1}{k} y\right), \; x \otimes x \right] \otimes \mathrm{Id}_{\mathbb{R}^n},
\qquad p_1^k = \left[ k x + y, \; -k x \right], \]

where $k x + y$ and $-k x$ are the column vectors of the matrix $p_1^k$. It is easy to verify that the equation $W_{1,k}\, p_1^k = v_k$ holds. Furthermore, we have

\[ \lim_{k \to \infty} \|p_1^k\| = \infty, \qquad
W_1 := \lim_{k \to \infty} W_{1,k} = \left[ x \otimes x, \; x \otimes x \right] \otimes \mathrm{Id}_{\mathbb{R}^n}. \]

Obviously, the rank of $W_1$ is equal to $n$ but $\operatorname{rank}(W_{1,k}) = 2n$ for all $k \in \mathbb{N}$. This example shows already
that we need assumptions on the boundedness of the parameter system and on the dimension of the subspace
$\operatorname{span}(W_{\mu,k})$.
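The behaviour described in this example can be reproduced numerically. The sketch below is illustrative: it fixes $n = 3$, takes $x$ and $y$ as the first two unit vectors, and uses hypothetical helper names (`outer3`, `v_k`, `W_1k`, `p_1k`); the ordering of the tensor factors in the matrix representation is an assumption of the sketch.

```python
import numpy as np

n = 3
x = np.array([1.0, 0.0, 0.0])
y = np.array([0.0, 1.0, 0.0])
outer3 = lambda a, b, c: np.einsum("a,b,c->abc", a, b, c)

b = outer3(x, x, y) + outer3(x, y, x) + outer3(y, x, x)   # tensor of rank 3

def v_k(k):
    u = x + y / k
    return outer3(u, u, k * x + y) - outer3(x, x, k * x)   # rank-2 tensor

def W_1k(k):
    """Matrix of the linear map acting on the growing (third) factor."""
    u = x + y / k
    return np.hstack([np.kron(np.kron(u, u)[:, None], np.eye(n)),
                      np.kron(np.kron(x, x)[:, None], np.eye(n))])

def p_1k(k):
    return np.concatenate([k * x + y, -k * x])             # stacked column vectors

W_limit = np.hstack([np.kron(np.kron(x, x)[:, None], np.eye(n))] * 2)

representation_ok = np.allclose(W_1k(10) @ p_1k(10), v_k(10).ravel())
rank_finite_k = np.linalg.matrix_rank(W_1k(10))            # expected 2n
rank_limit = np.linalg.matrix_rank(W_limit)                # expected n
error_large_k = np.linalg.norm(v_k(1000) - b)
norm_growth = (np.linalg.norm(p_1k(10)), np.linalg.norm(p_1k(1000)))
```

The approximation error shrinks like $1/k$ while the parameter norm grows like $k$, and the rank of the linear map drops only in the limit.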

Definition 4.1 (Critical Points). The set $\mathfrak{M}$ of critical points is defined by

\[ \mathfrak{M} := \{ v \in V : \exists p \in P : v = U(p) \wedge F'(p) = 0 \}. \tag{10} \]


In our context, critical points are tensors that can be represented in our tensor format $U$ and for which there exists a
parameter system $p$ such that $(f \circ U)'(p) = 0$, i.e. $p$ is a stationary point of $F = f \circ U$. A representation system
of a tensor $v = U(p)$ is never uniquely defined since the tensor format is a multilinear map. The following
remark shows that the non-uniqueness of a parameter system has even more subtle effects, in particular when
the parameter system of $v = U(p)$ is also a stationary point of $F$.

Remark 4.2. In general, $v \in \mathfrak{M}$ does not imply $F'(p) = 0$ for every parameter system $p$ of $v$, i.e. there exist a
tensor format $U$ and two different $p, \tilde p \in P$ such that $U(p) = U(\tilde p)$ and $0 = F'(p) \ne F'(\tilde p)$.

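Proof. Let

\[ U : \mathbb{R}^2 \times \mathbb{R}^2 \to \mathbb{R}^4, \qquad
(x, y) \mapsto U(x, y) := \begin{pmatrix} x_1 y_1 + x_2 y_1 \\ x_1 y_1 + x_2 y_1 \\ x_1 y_2 \\ x_2 y_2 \end{pmatrix}. \]

Obviously, $U$ is a bilinear map. Further, let $b = (1, 1, 0, 1)^T$, $A = \mathrm{Id}$ in the definition of $F$ from Eq. (2), and $e_1$
and $e_2$ the canonical vectors in $\mathbb{R}^2$, i.e. $e_1 = (1, 0)^T$ and $e_2 = (0, 1)^T$. Then the following holds:

a) $U(e_1, e_1) = U(e_2, e_1)$,

b) $F'(e_1, e_1) = 0$,

c) $F'(e_2, e_1) \ne 0$.

Elementary calculations result in

\[ U(e_1, e_1) = U(e_2, e_1) = (1, 1, 0, 0)^T. \]

The definition of $F$ from Eq. (2) gives

\[ F(x, y) = \frac{1}{3}\left( \frac{1}{2}\left( 2\,(y_1 (x_1 + x_2))^2 + (x_1 y_2)^2 + (x_2 y_2)^2 \right) - 2 (x_1 + x_2) y_1 - x_2 y_2 \right). \]

Then

\[ F(x, e_1) = \frac{1}{3}\left( (x_1 + x_2)^2 - 2 (x_1 + x_2) \right), \qquad
F(e_1, y) = \frac{1}{3}\left( y_1^2 + \frac{1}{2} y_2^2 - 2 y_1 \right), \qquad
F(e_2, y) = \frac{1}{3}\left( y_1^2 + \frac{1}{2} y_2^2 - 2 y_1 - y_2 \right) \]

and

\[ F_1'(x, e_1) = \frac{1}{3}\begin{pmatrix} 2 (x_1 + x_2 - 1) \\ 2 (x_1 + x_2 - 1) \end{pmatrix}, \qquad
F_2'(e_1, y) = \frac{1}{3}\begin{pmatrix} 2 (y_1 - 1) \\ y_2 \end{pmatrix}, \qquad
F_2'(e_2, y) = \frac{1}{3}\begin{pmatrix} 2 (y_1 - 1) \\ y_2 - 1 \end{pmatrix}. \]

One verifies that $F'(e_1, e_1) = (0, 0, 0, 0)^T$ and $F'(e_2, e_1) = (0, 0, 0, -\tfrac{1}{3})^T$.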
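The counterexample can be double-checked numerically. The following sketch evaluates the bilinear map and approximates the gradient of $F$ by central differences; the helper names and the step size are assumptions chosen for the illustration.

```python
import numpy as np

def U(x, y):
    """Bilinear map from the proof of Remark 4.2."""
    return np.array([x[0] * y[0] + x[1] * y[0],
                     x[0] * y[0] + x[1] * y[0],
                     x[0] * y[1],
                     x[1] * y[1]])

b = np.array([1.0, 1.0, 0.0, 1.0])

def F(x, y):
    u = U(x, y)
    return (0.5 * u @ u - b @ u) / (b @ b)        # Eq. (2) with A = Id

def grad_F(x, y, h=1e-6):
    """Central-difference gradient with respect to (x_1, x_2, y_1, y_2)."""
    z = np.concatenate([x, y])
    g = np.zeros(4)
    for i in range(4):
        e = np.zeros(4)
        e[i] = h
        zp, zm = z + e, z - e
        g[i] = (F(zp[:2], zp[2:]) - F(zm[:2], zm[2:])) / (2 * h)
    return g

e1 = np.array([1.0, 0.0])
e2 = np.array([0.0, 1.0])

same_tensor = np.allclose(U(e1, e1), U(e2, e1))
grad_at_e1_e1 = grad_F(e1, e1)
grad_at_e2_e1 = grad_F(e2, e1)
```

Both parameter pairs represent the tensor $(1, 1, 0, 0)^T$, but only $(e_1, e_1)$ is a stationary point of $F$.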


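For a convenient understanding, let us briefly repeat the notations from the ALS method, see Algorithm 2. Let
$\mu \in \mathbb{N}_L \cup \{0\}$, $k \in \mathbb{N}$, and

\[ p^{k,\mu} := (p_1^{k+1}, \dots, p_\mu^{k+1}, p_{\mu+1}^{k}, \dots, p_L^{k}), \qquad v_{k,\mu} := U(p^{k,\mu}) \tag{11} \]

be the elements of the sequences $(p^{k,\mu})_{k \in \mathbb{N}}$ and $(v_{k,\mu})_{k \in \mathbb{N}}$ from the ALS algorithm. Note that $p^{k,0} = p^k$, $p^{k,L} = p^{k+1} =
(p_1^{k+1}, \dots, p_L^{k+1})$ and $v_k = U(p^k)$.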

Definition 4.3 ($\mathcal{A}(v_k)$). The set of accumulation points of $(v_k)_{k \in \mathbb{N}}$ is denoted by $\mathcal{A}(v_k)$, i.e.

\[ \mathcal{A}(v_k) := \{ v \in V : v \text{ is an accumulation point of } (v_k)_{k \in \mathbb{N}} \}. \tag{12} \]

We demonstrate in Theorem 4.13 that every accumulation point of $(v_k)_{k \in \mathbb{N}}$ is a critical point, i.e. $\mathcal{A}(v_k) \subseteq \mathfrak{M}$.
This is an existence statement on the parameter space $P$. Lemma 4.5 shows us a candidate for such a parameter
system.

Remark 4.4. Obviously, if the sequence of parameters $(p^k)_{k \in \mathbb{N}}$ is bounded, then the set of accumulation points
of $(p^k)_{k \in \mathbb{N}}$ is not empty. Consequently, the set $\mathcal{A}(v_k)$ is not empty, since the tensor format $U$ is a continuous
map.

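Lemma 4.5. Let the sequence $(p^k)_{k \in \mathbb{N}}$ from the ALS method be bounded and define for $J \subseteq \mathbb{N}$ the following
set of accumulation points:

\[ \mathcal{A}_J := \bigcup_{\mu=0}^{L-1} \left\{ p \in P : p \text{ is an accumulation point of } (p^{k,\mu})_{k \in J} \right\}. \]

There exists $p^* = (p_1^*, \dots, p_L^*) \in \mathcal{A}_J$ such that

\[ \|p^*\| = \min_{p \in \mathcal{A}_J} \|p\|. \]

Proof. Since the sequence $(p^k)_{k \in \mathbb{N}}$ is bounded, it follows from the definition in Eq. (11) that $(p^{k,\mu})_{k \in \mathbb{N}}$ is
also bounded. Therefore the set $\{ p \in P : p \text{ is an accumulation point of } (p^{k,\mu})_{k \in J} \}$ is not empty and compact.
Hence $\mathcal{A}_J$ is a compact and non-empty set. ■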


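We are now ready to establish our main assumptions on the sequence from the ALS method.

Assumption 4.6. Throughout the article, we say that $(p^k)_{k \in \mathbb{N}}$ satisfies assumption A1 or assumption A2 if the
following holds true:

(A1) The sequence $(p^k)_{k \in \mathbb{N}}$ is bounded.

(A2) The sequence $(p^k)_{k \in \mathbb{N}}$ is bounded and for $J \subseteq \mathbb{N}$ we have

\[ \forall \mu \in \mathbb{N}_L : \exists k_0 \in J : \forall k \in J : k \ge k_0 \Rightarrow \operatorname{rank}(W_{k,\mu}) = \operatorname{rank}(W_\mu^*), \tag{13} \]

where $p^* = (p_1^*, \dots, p_L^*) \in \mathcal{A}_J$ is an accumulation point from Lemma 4.5 and

\[ W_{k,\mu} = W_\mu(p_1^{k+1}, \dots, p_{\mu-1}^{k+1}, p_{\mu+1}^{k}, \dots, p_L^{k}), \qquad
W_\mu^* = W_\mu(p_1^*, \dots, p_{\mu-1}^*, p_{\mu+1}^*, \dots, p_L^*). \]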

Remark 4.7. In the proof of Theorem 4.13, assumption A2 ensures that the ALS method depends continuously
on the parameter system $p^{k,\mu}$, i.e. $G_{k,\mu}^{+} W_{k,\mu}^T b \to (G_\mu^*)^{+} (W_\mu^*)^T b$ for a convergent subsequence $(p^{k,\mu})_{k \in J}$.

Using the notations and definitions from Section 3, we define further

\[ A_{k,\mu} := V_{k,\mu}^T A V_{k,\mu} \tag{14} \]

for $k \in \mathbb{N}$ and $\mu \in \mathbb{N}_L$.


For the ALS method there is an explicit formula for the decay of the values between $f(v_{k,\mu+1})$ and $f(v_{k,\mu})$.
The relation between the function values from Eq. (15) is crucial for the convergence analysis of the ALS
method.

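Lemma 4.8. Let $k \in \mathbb{N}$, $\mu \in \mathbb{N}_L$. We have

\[ f(v_{k,\mu+1}) - f(v_{k,\mu}) = -\frac{1}{2}\,\frac{\langle V_{k,\mu} A_{k,\mu}^{-1} V_{k,\mu}^T r_{k,\mu}, r_{k,\mu}\rangle}{\|b\|^2}. \tag{15} \]

Proof. From Algorithm 2 we have that $v_{k,\mu+1} = v_{k,\mu} + \Delta_{k,\mu}$, where $\Delta_{k,\mu} := V_{k,\mu} A_{k,\mu}^{-1} V_{k,\mu}^T r_{k,\mu}$. Elementary
calculations give

\begin{align*}
f(v_{k,\mu+1}) &= \frac{1}{\|b\|^2}\left[\frac{1}{2}\langle A (v_{k,\mu} + \Delta_{k,\mu}), v_{k,\mu} + \Delta_{k,\mu}\rangle - \langle b, v_{k,\mu} + \Delta_{k,\mu}\rangle\right] \\
&= f(v_{k,\mu}) + \frac{1}{\|b\|^2}\left[-\langle r_{k,\mu}, \Delta_{k,\mu}\rangle + \frac{1}{2}\langle A \Delta_{k,\mu}, \Delta_{k,\mu}\rangle\right] \\
&= f(v_{k,\mu}) + \frac{1}{\|b\|^2}\left[-\langle V_{k,\mu} A_{k,\mu}^{-1} V_{k,\mu}^T r_{k,\mu}, r_{k,\mu}\rangle + \frac{1}{2}\langle V_{k,\mu}^T A V_{k,\mu} A_{k,\mu}^{-1} V_{k,\mu}^T r_{k,\mu}, A_{k,\mu}^{-1} V_{k,\mu}^T r_{k,\mu}\rangle\right] \\
&= f(v_{k,\mu}) - \frac{1}{2 \|b\|^2}\langle V_{k,\mu} A_{k,\mu}^{-1} V_{k,\mu}^T r_{k,\mu}, r_{k,\mu}\rangle.
\end{align*}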
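The decay formula of Lemma 4.8 is easy to confirm on random data. The following sketch uses a random symmetric positive definite matrix and a random orthonormal basis as stand-ins for $A$ and $V_{k,\mu}$; these choices and the variable names are assumptions for the illustration only.

```python
import numpy as np

rng = np.random.default_rng(2)
N, m = 8, 3
M = rng.standard_normal((N, N))
A = M.T @ M + np.eye(N)                           # symmetric positive definite
b = rng.standard_normal(N)
f = lambda v: (0.5 * v @ A @ v - b @ v) / (b @ b)  # objective from Eq. (1)

V, _ = np.linalg.qr(rng.standard_normal((N, m)))  # stand-in for V_{k,mu}
v = V @ rng.standard_normal(m)                    # stand-in for v_{k,mu}
r = b - A @ v                                     # residual r_{k,mu}
A_k = V.T @ A @ V                                 # A_{k,mu} from Eq. (14)
delta = V @ np.linalg.solve(A_k, V.T @ r)         # update Delta_{k,mu}

lhs = f(v + delta) - f(v)
rhs = -0.5 * (delta @ r) / (b @ b)                # right-hand side of Eq. (15)
```

Since `delta @ r` equals $\langle V_{k,\mu} A_{k,\mu}^{-1} V_{k,\mu}^T r_{k,\mu}, r_{k,\mu}\rangle$ and $A_{k,\mu}^{-1}$ is positive definite, the right-hand side is never positive.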


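Corollary 4.9. $(f(v_k))_{k \in \mathbb{N}} \subset \mathbb{R}$ is a descending sequence and there exists $\alpha \in \mathbb{R}$ such that $f(v_k) \to \alpha$ as $k \to \infty$.

Proof. Let $k \in \mathbb{N}$ and $\mu \in \mathbb{N}_L$. From Lemma 4.8 it follows that

\[ f(v_{k+1}) - f(v_k) = f(v_{k,L}) - f(v_{k,0}) = \sum_{\mu=1}^{L} \left[ f(v_{k,\mu}) - f(v_{k,\mu-1}) \right]
= -\frac{1}{2 \|b\|^2} \sum_{\mu=0}^{L-1} \langle V_{k,\mu} A_{k,\mu}^{-1} V_{k,\mu}^T r_{k,\mu}, r_{k,\mu}\rangle \le 0, \]

since the matrices $A_{k,\mu}^{-1}$ are positive definite. This shows that $(f(v_k))_{k \in \mathbb{N}} \subset \mathbb{R}$ is a descending sequence. The
sequence of function values $(f(v_k))_{k \in \mathbb{N}}$ is bounded from below, since the matrix $A$ in the definition of $f$ is
positive definite. Therefore, there exists an $\alpha \in \mathbb{R}$ such that $f(v_k) \to \alpha$ as $k \to \infty$. ■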

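Lemma 4.10. Let $(v_{k,\mu})_{k \in \mathbb{N}, \mu \in \mathbb{N}_L} \subset V$ be the sequence from Algorithm 2. We have

\[ f(v_{k,\mu}) = -\frac{1}{2 \|b\|^2} \langle v_{k,\mu}, b\rangle = -\frac{1}{2 \|b\|^2} \|v_{k,\mu}\|_A^2 \tag{16} \]

for all $k \in \mathbb{N}$, $\mu \in \mathbb{N}_L$, where $\langle v, w\rangle_A := \langle Av, w\rangle$ and $\|v\|_A := \sqrt{\langle v, v\rangle_A}$.

Proof. Let $k \in \mathbb{N}$ and $\mu \in \mathbb{N}_L$. We have

\begin{align*}
\langle v_{k,\mu}, v_{k,\mu}\rangle_A &= \langle A V_{k,\mu-1} (V_{k,\mu-1}^T A V_{k,\mu-1})^{-1} V_{k,\mu-1}^T b, \; V_{k,\mu-1} (V_{k,\mu-1}^T A V_{k,\mu-1})^{-1} V_{k,\mu-1}^T b\rangle \\
&= \langle V_{k,\mu-1} (V_{k,\mu-1}^T A V_{k,\mu-1})^{-1} V_{k,\mu-1}^T b, \; b\rangle = \langle v_{k,\mu}, b\rangle.
\end{align*}

The rest follows from the definition of $f$, see Eq. (1). ■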
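The identity (16) holds for every vector of the form $V (V^T A V)^{-1} V^T b$ with an orthonormal basis $V$, which the following sketch verifies on random data. The random matrix, right-hand side and basis are assumptions standing in for the quantities of an actual ALS micro-step.

```python
import numpy as np

rng = np.random.default_rng(3)
N, m = 8, 3
M = rng.standard_normal((N, N))
A = M.T @ M + np.eye(N)                           # symmetric positive definite
b = rng.standard_normal(N)
nb2 = b @ b
f = lambda v: (0.5 * v @ A @ v - b @ v) / nb2     # objective from Eq. (1)

V, _ = np.linalg.qr(rng.standard_normal((N, m)))  # stand-in for V_{k,mu-1}
v = V @ np.linalg.solve(V.T @ A @ V, V.T @ b)     # Galerkin solution on range(V)

value = f(v)
via_inner_product = -0.5 * (v @ b) / nb2
via_energy_norm = -0.5 * (v @ A @ v) / nb2
```

All three numbers agree up to rounding, so monitoring $\langle v_{k,\mu}, b\rangle$ or $\|v_{k,\mu}\|_A^2$ is equivalent to monitoring the objective.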

Corollary 4.11. Let $(v_{k,\mu})_{k \in \mathbb{N}, \mu \in \mathbb{N}_L} \subset V$ be the sequence of represented tensors from the ALS algorithm. The
following holds:

(a) $f(v_{k,\mu+1}) \le f(v_{k,\mu})$,

(b) $\|v_{k,\mu+1}\|_A^2 \ge \|v_{k,\mu}\|_A^2$,

(c) $\langle v_{k,\mu+1}, b\rangle \ge \langle v_{k,\mu}, b\rangle$.

Proof. Follows from Lemma 4.8 and Lemma 4.10. ■

Lemma 4.12. Let the sequence $(p^k)_{k \in \mathbb{N}} \subset P$ fulfil the assumption A1. Then we have


max 

o<u<L-t 


f'M: 


k^oo 


0 . 


Proof. According to Lemma 3.6 and Lemma 4.8 we have


L ^ L-1 

f{vk) - f{vk+i) = f{vk,^^-l) - fivk,f,) = Y 


1.1=1 
1 


fj.=0 


> 


> 


2||6II2 
1 

nw 

1 




kt=0 

L-l 


E ('4L (c't,.A7) r'M) 


ki=0 

L-l 


Y. '^min(^fc^J^)Amin {pk,^k^Y^^. 


/i=0 




L-l 


2A„,ax(A)||6||2 {YuFkYlu) \\KY^) 


where $V_{k,\mu} = W_{k,\mu} U_{k,\mu} D_{k,\mu}^{-1/2}$ is from Eq. (6). In the last estimate, we have used that the Ritz values are
bounded by the smallest and largest eigenvalue of $A$, i.e. $\lambda_{\min}(A) \le \lambda_{\min}(A_{k,\mu}) \le \lambda_{\max}(A_{k,\mu}) \le \lambda_{\max}(A)$.
Since the tensor format $U$ is continuous and the sequence $(p^k)_{k \in \mathbb{N}}$ is bounded, it follows from the theorem of
Gershgorin and the Cauchy-Schwarz inequality that there is $\gamma > 0$ such that $\lambda_{\max}(W_{k,\mu}^T W_{k,\mu}) \le \gamma$. Recall that
$W_{k,\mu}^T W_{k,\mu} = \tilde U_{k,\mu} \tilde D_{k,\mu} \tilde U_{k,\mu}^T$, see Proposition 3.3. Therefore, we have

L-l 


f{vk) - f{Vk+l) > 


1 


2Amax(^)7l|i^lF 


E 7(!>: 


1-1=0 


> 


max 


2Amax(^)7ll&P 0<U<L-1 


KY 




> 0 . 


Further, it follows from Corollary 4.9 that


0 = lim s/f{vk) - f{vk+i) = lim ^ max F'(p^) 


k^oo 


k^oo 0<fi<L—l 


Theorem 4.13. Let $(v_k)_{k \in \mathbb{N}}$ be the sequence of represented tensors and suppose that the sequence of
parameters $(p^k)_{k \in \mathbb{N}} \subset P$ from the ALS method fulfils assumption A2. Every accumulation point of $(v_k)_{k \in \mathbb{N}}$ is a critical
point, i.e. $\mathcal{A}(v_k) \subseteq \mathfrak{M}$. Further, we have

\[ \operatorname{dist}(v_k, \mathfrak{M}) \xrightarrow[k \to \infty]{} 0. \]

Proof. Let $\bar v \in \mathcal{A}(v_k)$ be an accumulation point and $(v_k)_{k \in J \subseteq \mathbb{N}} \subset U(P)$ a subsequence in the range set of the
tensor format $U$ with $v_k \to \bar v$ as $k \to \infty$. Then there exists $(p^k)_{k \in J \subseteq \mathbb{N}} \subset P$ with $v_k = U(p^k)$ for all $k \in J$. Further,

let G IN/, and define for all k ^ J 

Let ^* € ]Nl and )i.gj'cj with gk,u. - 1 p* argmax ^ , ||p|| € F, see Lemma|131 Without loss 

of generality, let us assume that p* = 1. This assumption makes the the notations not more complicated then 
necessary. Since {gk,k.)keJ is bounded, there exists € P and a corresponding subsequence {g^ ^)fceJ^cj 
such that 


P-k P-kfl 

- (Pi,P2i---iPl) T- ^p - (PliP2, • • • ,Pl), 

/c—>-oo “ 



U.k,l 

- • • • ,Pl) - -^ - (Pl,P2, ■ ■ ■ ,Pl), 

k^oo ~ 



i.k,kL 

- (p^^S---,p^^\pf;+i,---,PL) - (pi>-' 

' ■ 1 P/t) P/L+l ) • ' 

• • ,Pl) 

i.k,L 

- {Pi^\ ■ ■ ■ ^ P^^' - (Pi, • • • ,Pl)- 

k^oo ~ 




From Lemma l4.15l and U{p ) - > v it follows that 

k^oo 

V = lim U{g ) = U{p^^^) f.a. ^ G IN/,, (17) 

fc —>-00 —— 

where k ^ J. Furthermore, we have 

p* = = • • • = pW. 

To show this, assume that 

M := |// G IN/, / 0 

and define o := min M G IN/,. From assumption A2 if follows 

^ GtW^b = p,. 

Thus we have in particular fhaf p,^TkemelVFjx, see fhe Definition of Gf and Proposition 13.31 Since U{p*) = 

WuPv = WyPv, if follows furlher fhaf 5^ := Pu — Pu G kemelPFi/ and ||p^|p = ||Pi/|P + ||<5i/|p. 
Lemmaand Lemma l4.12l show fhaf 

WlMWk,uPl - b) - -^ 0 . 

k—foo 

From fhe definifion of o, we have fhen 

fLj(AVF,p,-6) = 0, 

nofe fhaf for v = min M 

lk,u = • • • ^Pl) = (pii • • • ,Pu-i,Pu,Pu+i, ■ ■ ■ ,Pl) 

holds. Since pp, = GfWjb and p* = argminpg_ 4 ^||p||, if follows \\pu\\ = \\Pu\\- Hence, we have pip = p,g, 
because \\pu\\‘^ = ||pi/|P + ||<5i/|P implies fhen 5y = 0. Bufpjg = p^ confradicfs fhe definition of o = minM. 
Consequenfly, we have 

p* = = • • • = pW. 
From Eq. (fTTI) and the definition of it follows then

V = U{p*) 


and 

0 = lim F'^ig ) = for all p € 

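i.e. $\mathcal{A}(v_k) \subseteq \mathfrak{M}$. Now, let $\delta_k = \inf_{w \in \mathfrak{M}} \|v_k - w\|$ and suppose that there exists a subsequence $(\delta_k)_{k \in J \subseteq \mathbb{N}}$ with
$\lim_{k \to \infty} \delta_k = \delta \in \mathbb{R}_+$. Then $(v_k)_{k \in J}$ has a convergent subsequence. Since this subsequence must have its
limit point in $\mathfrak{M}$, it follows that $\delta = 0$, which proves $\operatorname{dist}(v_k, \mathfrak{M}) \to 0$ as $k \to \infty$ by contradiction.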


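Lemma 4.14. Let $(v_k)_{k \in \mathbb{N}} \subset V$ be the sequence of represented tensors from the ALS method. It holds

\[ \|v_{k+1} - v_k\|_A \xrightarrow[k \to \infty]{} 0. \]

Proof. Let $k \in \mathbb{N}$. We have

\[ \|v_{k+1} - v_k\|_A^2 = \Big\| \sum_{\mu=0}^{L-1} (v_{k,\mu+1} - v_{k,\mu}) \Big\|_A^2 \le L \sum_{\mu=0}^{L-1} \|v_{k,\mu+1} - v_{k,\mu}\|_A^2. \tag{18} \]

Since $v_{k,\mu+1} - v_{k,\mu} = V_{k,\mu} A_{k,\mu}^{-1} V_{k,\mu}^T r_{k,\mu}$, it follows further

\[ \|v_{k,\mu+1} - v_{k,\mu}\|_A^2 = \langle A V_{k,\mu} A_{k,\mu}^{-1} V_{k,\mu}^T r_{k,\mu}, \; V_{k,\mu} A_{k,\mu}^{-1} V_{k,\mu}^T r_{k,\mu}\rangle
= \langle V_{k,\mu} A_{k,\mu}^{-1} V_{k,\mu}^T r_{k,\mu}, r_{k,\mu}\rangle. \]

Combining this with Eq. (15) and (18) gives

\[ \|v_{k+1} - v_k\|_A^2 \le 2 L \|b\|^2 \sum_{\mu=0}^{L-1} \left( f(v_{k,\mu}) - f(v_{k,\mu+1}) \right) = 2 L \|b\|^2 \left( f(v_k) - f(v_{k+1}) \right). \]

From Corollary 4.9 it follows $(f(v_k) - f(v_{k+1})) \to 0$ as $k \to \infty$. Therefore, we have

\[ \|v_{k+1} - v_k\|_A \xrightarrow[k \to \infty]{} 0. \]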


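Lemma 4.15. Let $(v_k)_{k \in \mathbb{N}} \subset V$ be the sequence of tensors from the ALS method and $\bar v \in V$ with $\lim_{k \to \infty} v_k =
\bar v$. Further, let $\mu \in \mathbb{N}_L$ and $(v_{k,\mu})_{k \in \mathbb{N}} \subset V$ as defined in Algorithm 2. We have

\[ \lim_{k \to \infty} v_{k,\mu} = \bar v \quad \text{for all } \mu \in \mathbb{N}_L. \]

Proof. Define $v_{k,0} := v_k$ (as in Algorithm 2) and assume that

\[ M := \left\{ \mu \in \mathbb{N}_L : \lim_{k \to \infty} v_{k,\mu} \ne \bar v \right\} \ne \emptyset. \]

Furthermore, set $\mu^* := \min M \in \mathbb{N}_L$.

From Lemma 4.14 it follows

\[ \|v_{k,\mu^*} - \bar v\|_A \le \|v_{k,\mu^*} - v_{k,\mu^*-1}\|_A + \|v_{k,\mu^*-1} - \bar v\|_A \xrightarrow[k \to \infty]{} 0. \]

But this contradicts the definition of $\mu^*$. ■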

In the following, the dimension of the tensor space $V = \bigotimes_{\mu=1}^{d} \mathbb{R}^{m_\mu}$ is denoted by $N \in \mathbb{N}$, i.e. $N =
\prod_{\mu=1}^{d} m_\mu$. The statement of Lemma 4.20 delivers an explicit recursion formula for the tangent of the angle
between iteration points and an arbitrary tensor. This result is important for the rate of convergence of the ALS
algorithm.

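Lemma 4.16. Let $\mu, \nu \in \mathbb{N}_L$ with $\nu \ne \mu$, and $p = (p_1, \dots, p_\nu, \dots, p_\mu, \dots, p_L) \in P$. There exists a multilinear
map $M_{\mu,\nu} : P_1 \times \dots \times P_{\nu-1} \times P_{\nu+1} \times \dots \times P_{\mu-1} \times P_{\mu+1} \times \dots \times P_L \times V \to L(P_\nu, P_\mu)$ such that

\[ W_\mu^T(p_1, \dots, q_\nu, \dots, p_{\mu-1}, p_{\mu+1}, \dots, p_L)\, b = M_{\mu,\nu}(p_1, \dots, p_{\nu-1}, p_{\nu+1}, \dots, p_{\mu-1}, p_{\mu+1}, \dots, p_L, b)\, q_\nu \tag{19} \]

for all $q_\nu \in P_\nu$. Moreover, we have

\[ M_{\mu,\nu}^T(p_1, \dots, p_{\nu-1}, p_{\nu+1}, \dots, p_{\mu-1}, p_{\mu+1}, \dots, p_L, b) = M_{\nu,\mu}(p_1, \dots, p_{\nu-1}, p_{\nu+1}, \dots, p_{\mu-1}, p_{\mu+1}, \dots, p_L, b). \]

Proof. Follows from Proposition 3.3 (v) and the definition of $W_\mu^T(p_1, \dots, q_\nu, \dots, p_{\mu-1}, p_{\mu+1}, \dots, p_L)\, b$. ■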

Corollary 4.17. Let p G IN^,, fe > 2, and^ = (Pi^^ j • • • >j P%P^^+i ) • • • > p\)A^ Algorithm^ There 
exists a multilinear map : Pi x • • • x P ^_2 x P^^+i x • • • x P/^ x V —)■ P(P^_i, P^) such that 


pV^ 

= giAUpP.---. 

k+l k 

Pa-2’Pa+i’■ ■ ■ 

PlA)pA-\’ 


(20) 

pV-\ 


■ ■ >Pa-2’Pa+i’ 

■ ■■PlApI, 


(21) 

i.e. p^+i 


Pa-2’Pa+i’ • • • 

,PLMat„.,M>P,-- 

rf+t k 

■ )Pa-2’Pa+i’ • ■ 

■■,PlMpI^ 


and ^ are defined 

in Algorithm |2] 





Proof. Follows from Eq. (9) and Lemma 4.16. ■


The following example shows a concrete realisation of the matrix for the tensor rank-one approximation 
problem. 

Example 4.18. The approximation ofb£Vbya rank one tensor is considered. Let Vk = Pi ® pi <8) ■ ■ ■ (8) p^ 
and 

ti td d 

b — /3(ii,...,id) bfj^^i^, 

h=l *£i=l M=l 


i.e. the tensor b is given in the Tucker decomposition. From Eq. dP]) it follows 


= 


1 


n d II zp 112 

U=2 Pall 


1 


nUUWm 

1 


■ ^ Ah,... 

■p) n ^ 

[bu,i^^Pl)Kh 



fL=2 




t2 

td-l 

d-1 


.i.E- 

Ah,-,id) 

n 

n=i id=i 

22 = 1 

*d-l=l 

fi=2 

PiTpfcPj 

Pi, 




.T 
,fc|l 


M 


p\ 


Ml(P2v>Pd_l) = 


15 
















where 1 G BJ^B^ = and the entries of the matrix 

are defined by 


t2 


td-1 


d-1 


*2=1 *d_i=l 


WpU 


(1 < ii < ti, I <id<td) ■ 


Note that Fi ^ is a diagonal matrix if the coefficient tensor fi G IR-*'' i^ super- diagonal, see the example 


in 


For p^ it follows further 


rfi - 

Pd — 


ILfcl|2 


K=\ Up: 


h td d-l 

/^(ii,...,*d) ri ^d,id = 


Mil *1=1 *;J=1 


and finally 


„fc+i _ 

^1 “ 


M=1 

1 


|Pi|| 11 m=2||Pm 


-BdTl.Bfp'l 


nu u\ 


B,T,,kTl,Bf pI 


Lemma 4.19. Let /i G IN/^, {p^ C P, and ('i'A:,M)fcpiN C V f/ie sequences from Algorithm^ Further¬ 

more, define 


Mk,^ 


iMifceM 


H, 


:= iV4(p^^•••:PM-2,PM+l>•••>PL,&), 
iV.,M := 

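where we have used the notations from Algorithm 2. A micro-step of the ALS method is described by the
following recursion formula:

\[ v_{k,\mu+1} = N_{k,\mu}\, v_{k,\mu} \quad \text{for all } k \ge 2, \tag{22} \]

i.e.

\[ U(p_1^{k+1}, \dots, p_{\mu-1}^{k+1}, p_\mu^{k+1}, p_{\mu+1}^{k}, \dots, p_L^{k}) = N_{k,\mu}\, U(p_1^{k+1}, \dots, p_{\mu-1}^{k+1}, p_\mu^{k}, \dots, p_L^{k}). \]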

Proof According to Corollary 14.171 Remarkand definition of we have that 

= 'WpGlphNP-i = 


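Lemma 4.20. Let $(N_{k,\mu})_{k \in \mathbb{N}, \mu \in \mathbb{N}_L}$ be the sequence of matrices from Lemma 4.19, $v \in V \setminus \{0\}$, and $R \in
\mathbb{R}^{N \times (N-1)}$ an orthogonal matrix with $\operatorname{span}(v)^\perp = \operatorname{range}(R)$, i.e. the column vectors of $R$ form an orthonormal
basis of the linear space $\operatorname{span}(v)^\perp$. Assume further that $c_{k,\mu} := v^T v_{k,\mu} \in \mathbb{R} \setminus \{0\}$ and $s_{k,\mu} := R^T v_{k,\mu} \in
\mathbb{R}^{N-1} \setminus \{0\}$ holds true. Then we have the following recursion formula for the tangent of the angles:

\[ \left| \tan \angle[v, v_{k,\mu+1}] \right| = \frac{q_{k,\mu}^{(s)}}{q_{k,\mu}^{(c)}} \left| \tan \angle[v, v_{k,\mu}] \right|, \]

where

\[ q_{k,\mu}^{(s)} := \frac{\left\| R^T N_{k,\mu} v \, c_{k,\mu} + R^T N_{k,\mu} R \, s_{k,\mu} \right\|}{\|s_{k,\mu}\|}, \qquad
q_{k,\mu}^{(c)} := \frac{\left| v^T N_{k,\mu} v \, c_{k,\mu} + v^T N_{k,\mu} R \, s_{k,\mu} \right|}{|c_{k,\mu}|}. \]

Proof. Without loss of generality let $\|v\| = 1$. The block matrix

\[ \mathbf{V} := [\, v \;\; R \,] \in \mathbb{R}^{N \times N} \]

is orthogonal, i.e. the columns of the matrix $\mathbf{V}$ build an orthonormal basis of the tensor space $V$. The tensor
$v_{k,\mu}$ and the matrix $N_{k,\mu}$ are represented with respect to the basis $\mathbf{V}$, i.e.

\[ v_{k,\mu} = \mathbf{V} \mathbf{V}^T v_{k,\mu} = [\, v \;\; R \,] \begin{pmatrix} c_{k,\mu} \\ s_{k,\mu} \end{pmatrix} \]

and

\[ N_{k,\mu} = \mathbf{V} \left( \mathbf{V}^T N_{k,\mu} \mathbf{V} \right) \mathbf{V}^T = [\, v \;\; R \,]
\begin{pmatrix} v^T N_{k,\mu} v & v^T N_{k,\mu} R \\ R^T N_{k,\mu} v & R^T N_{k,\mu} R \end{pmatrix} [\, v \;\; R \,]^T. \]

The recursion formula (22) leads to the recursion of the coefficient vector

\[ \begin{pmatrix} c_{k,\mu+1} \\ s_{k,\mu+1} \end{pmatrix}
= \begin{pmatrix} v^T N_{k,\mu} v & v^T N_{k,\mu} R \\ R^T N_{k,\mu} v & R^T N_{k,\mu} R \end{pmatrix}
\begin{pmatrix} c_{k,\mu} \\ s_{k,\mu} \end{pmatrix}
= \begin{pmatrix} v^T N_{k,\mu} v \, c_{k,\mu} + v^T N_{k,\mu} R \, s_{k,\mu} \\ R^T N_{k,\mu} v \, c_{k,\mu} + R^T N_{k,\mu} R \, s_{k,\mu} \end{pmatrix}. \]

Since $\|s_{k,\mu}\| \ne 0$ and $|c_{k,\mu}| \ne 0$ we have

\[ \tan^2 \angle[v, v_{k,\mu+1}] = \frac{\langle R R^T v_{k,\mu+1}, v_{k,\mu+1}\rangle}{\langle v v^T v_{k,\mu+1}, v_{k,\mu+1}\rangle}
= \frac{\|R^T v_{k,\mu+1}\|^2}{(v^T v_{k,\mu+1})^2} = \frac{\|s_{k,\mu+1}\|^2}{(c_{k,\mu+1})^2}
= \left( \frac{q_{k,\mu}^{(s)}}{q_{k,\mu}^{(c)}} \right)^2 \frac{\|s_{k,\mu}\|^2}{(c_{k,\mu})^2}
= \left( \frac{q_{k,\mu}^{(s)}}{q_{k,\mu}^{(c)}} \right)^2 \tan^2 \angle[v, v_{k,\mu}]. \]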
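The recursion for the tangent can be illustrated with generic data. In the sketch below a random matrix stands in for $N_{k,\mu}$ and a random vector for $v_{k,\mu}$; these stand-ins, the helper name `tan_angle`, and the construction of $R$ from a complete QR factorisation are assumptions made for the example.

```python
import numpy as np

def tan_angle(v, w):
    """|tan| of the angle between span(v) and w (hypothetical helper)."""
    v = v / np.linalg.norm(v)
    c = v @ w                                     # component along v
    s = w - c * v                                 # component in span(v)^perp
    return np.linalg.norm(s) / abs(c)

rng = np.random.default_rng(3)
n = 5
v = rng.standard_normal(n)
v /= np.linalg.norm(v)
Q, _ = np.linalg.qr(v[:, None], mode="complete")
R = Q[:, 1:]                                      # orthonormal basis of span(v)^perp

N_mat = rng.standard_normal((n, n))               # stand-in for N_{k,mu}
w = rng.standard_normal(n)                        # stand-in for v_{k,mu}
w_next = N_mat @ w                                # recursion (22)

c, s = v @ w, R.T @ w
c_next = v @ N_mat @ v * c + v @ N_mat @ R @ s    # first row of the block recursion
s_next = R.T @ N_mat @ v * c + R.T @ N_mat @ R @ s
q_s = np.linalg.norm(s_next) / np.linalg.norm(s)
q_c = abs(c_next) / abs(c)

lhs = tan_angle(v, w_next)
rhs = q_s / q_c * tan_angle(v, w)
```

The ratio `q_s / q_c` is the factor by which one micro-step changes the tangent; values below one mean that the iterate turns towards $v$.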


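Theorem 4.21. Suppose that the sequence $(p^k)_{k \in \mathbb{N}} \subset P$ from Algorithm 2 fulfils assumption A1. If one
accumulation point $\bar v \in \mathcal{A}(v_k) \ne \emptyset$ is isolated, then we have

\[ v_k \xrightarrow[k \to \infty]{} \bar v. \]

Furthermore, we have that either the ALS method converges after finitely many iteration steps or

\[ \left| \tan \angle[\bar v, v_{k,\mu+1}] \right| \le q_\mu \left| \tan \angle[\bar v, v_{k,\mu}] \right|, \]

where

\[ q_\mu := \limsup_{k \to \infty} \frac{q_{k,\mu}^{(s)}}{q_{k,\mu}^{(c)}}. \]

Proof. Let $\varepsilon > 0$ such that $\bar v$ is the only accumulation point in $\mathcal{U} := \{ v \in V : \|\bar v - v\|_A < \varepsilon \}$. Assume that
the sequence $(v_k)_{k \in \mathbb{N}} \subset V$ from the ALS algorithm does not converge to $\bar v$ and let $\mathcal{I} \subset \mathbb{N}$ be a subset with

\[ \|\bar v - v_k\|_A < \varepsilon \]

for all $k \in \mathcal{I}$. Since $\bar v$ is the only accumulation point in $\mathcal{U}$ and $(v_k)_{k \in \mathbb{N}}$ does not converge to $\bar v$, the following set $\mathcal{I}_k$
is for all $k \in \mathcal{I}$ well-defined and finite:

\[ \mathcal{I}_k := \left\{ k' \in \mathbb{N} : \|\bar v - v_j\|_A < \varepsilon \text{ for all } k \le j \le k' \right\}. \]

The definition of the map $k' : \mathcal{I} \to \mathbb{N}$, $k \mapsto k'(k) := \max \mathcal{I}_k$ implies that

\[ \|\bar v - v_{k'(k)}\|_A < \varepsilon \quad \text{and} \quad \|\bar v - v_{k'(k)+1}\|_A \ge \varepsilon \]

for all $k \in \mathcal{I}$. Since $\bar v$ is the only accumulation point of $(v_k)_{k \in \mathbb{N}}$ in $\mathcal{U}$ it follows that the subsequence
$(v_{k'(k)})_{k \in \mathcal{I}}$ converges to $\bar v$. Therefore, we have

\[ \|\bar v - v_{k'(k)}\|_A \le \frac{\varepsilon}{2} \]

and

\[ \|v_{k'(k)+1} - v_{k'(k)}\|_A \ge \|\bar v - v_{k'(k)+1}\|_A - \|\bar v - v_{k'(k)}\|_A \ge \frac{\varepsilon}{2} \]

for sufficiently large $k \in \mathcal{I}$. But this contradicts the statement $\|v_{k+1} - v_k\|_A \to 0$ as $k \to \infty$ from Lemma 4.14. The
inequality for the rate of convergence of an ALS micro-step, $|\tan \angle[\bar v, v_{k,\mu+1}]| \le q_\mu |\tan \angle[\bar v, v_{k,\mu}]|$, follows
directly from Lemma 4.20 and the definition of $q_\mu$. Note that in Lemma 4.20 $c_{k,\mu} \ne 0$ for sufficiently large $k$ since $\lim_{k \to \infty} v_{k,\mu} = \bar v$.
If $s_{k_0,\mu} = 0$ for some $k_0 \in \mathbb{N}$, then the ALS method converges after finitely many iteration steps. ■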

Corollary 4.22. Suppose that the sequence $(p_k)_{k \in \mathbb{N}} \subset P$ fulfils A2 and assume that the set of critical points $\mathfrak{M}$ is discrete^, then the sequence of represented tensors $(v_k)_{k \in \mathbb{N}}$ from the ALS method is convergent.

Proof. Follows directly from Theorem 4.21 and Theorem 4.13. ■

Remark 4.23. 

• The convergence rate for an entire ALS iteration step is given by $q := \prod_{\mu=1}^{L} q_{\mu-1}$, since

\[
  |\tan\angle[v, v_{k+1}]| = |\tan\angle[v, v_{k,L}]| \le q_{L-1} \, |\tan\angle[v, v_{k,L-1}]| \le \prod_{\mu=1}^{L} q_{\mu-1} \, |\tan\angle[v, v_{k,0}]| = q \, |\tan\angle[v, v_k]| .
\]

• Without further assumptions on the tensor $b$ from Eq. (7), one cannot say more about the rate of convergence. But the ALS method can converge sublinearly, Q-linearly, and even Q-superlinearly. We refer the reader to [28] for a detailed description of convergence speed.

- If $q = 0$, then the sequence $(|\tan\angle[v, v_k]|)_{k \in \mathbb{N}}$ converges Q-superlinearly.

- If $q < 1$, then the sequence $(|\tan\angle[v, v_k]|)_{k \in \mathbb{N}}$ converges at least Q-linearly.

- If $q = 1$, then the sequence $(|\tan\angle[v, v_k]|)_{k \in \mathbb{N}}$ converges sublinearly.

A specific tensor format $U$ has practically no impact on the different convergence rates, since explicit examples of all three cases can already be found for rank-one tensors. Please note that the representation of rank-one tensors is included in all tensor formats of practical interest. In the following, we give a brief overview of our results about the convergence rates for the tensor rank-one approximation; please see [13] for proofs and a detailed description. The multilinear map that describes the representation of rank-one tensors is given by


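\[
  U : \mathop{\times}_{\mu=1}^{d} \mathbb{R}^{n_\mu} \to \bigotimes_{\mu=1}^{d} \mathbb{R}^{n_\mu},
  \qquad
  (p_1, \dots, p_d) \mapsto U(p_1, \dots, p_d) = \bigotimes_{\mu=1}^{d} p_\mu .
\]
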
^In topology, a set which is made up only of isolated points is called discrete. 
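A tensor $b$ is called totally orthogonal decomposable if there exist $r \in \mathbb{N}$ with

\[
  b = \sum_{j=1}^{r} \bigotimes_{\mu=1}^{d} b_{j,\mu}
\]

such that for all $\mu \in \mathbb{N}_d$ and $j_1, j_2 \in \mathbb{N}_r$ the following holds:

\[
  \langle b_{j_1,\mu}, b_{j_2,\mu} \rangle = 0 \quad \text{whenever } j_1 \neq j_2 .
\]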
The set of all totally orthogonal decomposable tensors is denoted by

\[
  \mathcal{TO} = \{ b \in \mathcal{V} : b \text{ is totally orthogonal decomposable} \} \subset \mathcal{V} .
\]

It is shown in [13] that the tensor rank-one approximation of every $b \in \mathcal{TO}$ by means of the ALS method converges Q-superlinearly, i.e. $q = 0$.


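For examples of Q-linear and sublinear convergence, we will consider the tensor $b_\lambda$ given by

\[
  b_\lambda = \bigotimes_{\mu=1}^{3} p + \lambda \, ( p \otimes q \otimes q + q \otimes p \otimes q + q \otimes q \otimes p )
\]

for some $\lambda \in \mathbb{R}_{\ge 0}$ and $p, q \in \mathbb{R}^{n}$ with $\|p\| = \|q\| = 1$, $\langle p, q \rangle = 0$. If $\lambda \le \frac{1}{2}$, it is shown in [13] that $v = \bigotimes_{\mu=1}^{3} p$ is the unique best approximation of $b_\lambda$. Furthermore, for the rate of convergence we have the following two cases:

a) For $\lambda = \frac{1}{2}$ it holds $q = 1$, i.e. the sequence $(|\tan\angle[v, v_k]|)_{k \in \mathbb{N}}$ converges sublinearly.

b) For $\lambda < \frac{1}{2}$ the ALS method converges Q-linearly with the convergence rate

\[
  q_\lambda = \frac{\lambda}{2} \left( 3\lambda + \lambda^2 + \sqrt{(3\lambda + \lambda^2)^2 + 4\lambda} \right) .
\]

This example is not restricted to $d = 3$. The extension to higher dimensions is straightforward, see [13] for details.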
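The two cases above can be read off directly from the closed-form rate. As a small illustration (a sketch only, with the hypothetical helper name `q_lambda`), the following evaluates $q_\lambda$ for the values of $\lambda$ used in the numerical experiments and shows that the rate reaches 1 exactly at $\lambda = \frac{1}{2}$:

```python
import math

def q_lambda(lam: float) -> float:
    """Rate lam/2 * (3*lam + lam^2 + sqrt((3*lam + lam^2)^2 + 4*lam))."""
    a = 3.0 * lam + lam ** 2
    return 0.5 * lam * (a + math.sqrt(a * a + 4.0 * lam))

# Q-linear regime for lam < 1/2, boundary case q = 1 at lam = 1/2.
for lam in (0.1, 0.2, 0.3, 0.44, 0.46, 0.48, 0.5):
    print(f"lambda = {lam:4.2f}   q_lambda = {q_lambda(lam):.4f}")
```

For $\lambda = 0.46$ this evaluates to approximately 0.847, the value quoted in Example 3.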


5 Numerical Experiments 

In this section, we observe the convergence behavior of the ALS method by using data from interesting examples and, more importantly, from real applications. In all cases, we focus particularly on the convergence rate.


5.1 Example 1 


We consider an example introduced by Mohlenkamp in [25, Section 4.3.5]. Here we have $A = \mathrm{id}$ and


b 




—V* 

bi:= 


1 

0 
see Eq. O- The tensor $b$ is orthogonally decomposable. Although the example is rather simple, it is of great theoretical interest. It follows from Theorem 4.21 and [13] that the rate of convergence for an ALS micro-step is

\[
  q_\mu = \limsup_{k \to \infty} \frac{q^{(s)}_{k,\mu}}{q^{(c)}_{k,\mu}} = 0 .
\]


Here the ALS method converges Q-superlinearly. Let $r > 0$ be the parameter of our initial guess $v_0(r)$. For $r < \frac{1}{2}$ the initial guess $v_0(r)$ is dominated by $b_2$. Therefore, the ALS iteration converges to $b_2$, see [13] for details. In our numerical test, the tangent of the angle between the current iteration point and the corresponding parameter of the dominant term $b_l$ ($1 \le l \le 2$) is plotted in Figure 2, i.e.

\[
  \tan \varphi_{k,l} = \sqrt{\frac{1 - \cos^2 \varphi_{k,l}}{\cos^2 \varphi_{k,l}}} , \tag{23}
\]

where $\cos \varphi_{k,l}$ is the inner product of the current iterate's parameter with $b_l$, divided by the product of their norms.


5.2 Example 2 

Most algorithms in ab initio electronic structure theory compute quantities in terms of one- and two-electron integrals. In [1] we considered the low-rank approximation of the two-electron integrals. In order to illustrate the convergence of the ALS method on an example of practical interest, we use the two-electron integrals of the so-called AO basis for the CH4 molecule. We refer the reader to [1] for a detailed description of our example. The ALS method converges here Q-linearly, see Figure 3.


5.3 Example 3 


We consider the tensor

\[
  b_\lambda = \bigotimes_{\mu=1}^{3} p + \lambda \, ( p \otimes q \otimes q + q \otimes p \otimes q + q \otimes q \otimes p )
\]

from Remark 4.23. The vectors $p$ and $q$ are arbitrarily generated orthogonal vectors with norm 1. The values of $\tan \varphi_{k,1}$ are plotted, where $\varphi_{k,1}$ is the angle between $p_1^k$ and the limit point $p$. For the case $\lambda = 0.5$ the convergence is sublinear, whereas for $\lambda < 0.5$ it is Q-linear. According to Theorem 4.21 and [13], the rate of convergence for an ALS micro-step is given by

\[
  q_\lambda = \limsup_{k \to \infty} \frac{q^{(s)}_{k,\mu}}{q^{(c)}_{k,\mu}} = \frac{\lambda}{2} \left( 3\lambda + \lambda^2 + \sqrt{(3\lambda + \lambda^2)^2 + 4\lambda} \right) .
\]

For $\lambda = 0.46$, the convergence rate is $q_{0.46} = 0.847$. In Figure 6 the ratio $\tan \varphi_{k+1,1} / \tan \varphi_{k,1}$ is plotted. The ratio matches $q_{0.46} = 0.847$ perfectly. This plot illustrates on an example that the analytical description of the convergence rate is precise.
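The experiment can be reproduced in a few lines. The sketch below is an illustration under stated assumptions, not the code used for the figures: it takes $A = \mathrm{id}$, fixes $p$ and $q$ as the first two unit vectors of $\mathbb{R}^3$ instead of generating them randomly, starts from the hypothetical initial guess $p + 0.1\,q$ in every direction, and measures the tangent of the angle between the first factor and $p$ after each full ALS sweep. All function names are hypothetical.

```python
import numpy as np

def q_lambda(lam):
    """Closed-form rate from Remark 4.23."""
    a = 3.0 * lam + lam ** 2
    return 0.5 * lam * (a + np.sqrt(a * a + 4.0 * lam))

def outer3(x, y, z):
    """Elementary tensor x (x) y (x) z of order three."""
    return np.einsum("i,j,k->ijk", x, y, z)

def als_tangents(b, p, start, sweeps):
    """Rank-one ALS for A = id; returns the tangent of the angle between
    the first factor and the unit vector p after each full sweep."""
    x = [s.copy() for s in start]
    tangents = []
    for _ in range(sweeps):
        # Micro-steps: least-squares update of one factor, others fixed.
        x[0] = np.einsum("ijk,j,k->i", b, x[1], x[2]) / ((x[1] @ x[1]) * (x[2] @ x[2]))
        x[1] = np.einsum("ijk,i,k->j", b, x[0], x[2]) / ((x[0] @ x[0]) * (x[2] @ x[2]))
        x[2] = np.einsum("ijk,i,j->k", b, x[0], x[1]) / ((x[0] @ x[0]) * (x[1] @ x[1]))
        c = p @ x[0]                      # component along p
        s = np.linalg.norm(x[0] - c * p)  # component orthogonal to p
        tangents.append(s / abs(c))
    return np.array(tangents)

lam = 0.46
p = np.array([1.0, 0.0, 0.0])
q = np.array([0.0, 1.0, 0.0])
b = outer3(p, p, p) + lam * (outer3(p, q, q) + outer3(q, p, q) + outer3(q, q, p))

start = [p + 0.1 * q for _ in range(3)]   # assumed initial guess near p
tangents = als_tangents(b, p, start, sweeps=60)
ratios = tangents[1:] / tangents[:-1]
print(ratios[-1], q_lambda(lam))
```

In this setting the ratio of successive tangents settles at the closed-form value of $q_\lambda$ after a few sweeps. Setting `lam = 0.5` in the same script lets the ratio approach 1, which corresponds to the sublinear case.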
Figure 2: The tangent $\tan \varphi_{k,2}$ from Eq. (23) is plotted against the iteration number $k$ for $r \in \{0.4, 0.495, 0.4999\}$.
(a) The tangents $\tan \varphi_{k,1}$ for $\lambda \in \{0.1, 0.2, 0.3\}$.

(b) The tangents $\tan \varphi_{k,1}$ for $\lambda \in \{0.44, 0.46, 0.48\}$.

Figure 4: The approximation of $b_\lambda$ from Remark 4.23 is considered. The tangent of the angle between the current iteration point and the limit point is plotted with respect to the iteration number. For $\lambda < 0.5$ the sequence converges Q-linearly with a convergence rate $q_\lambda = \frac{\lambda}{2} \left( 3\lambda + \lambda^2 + \sqrt{(3\lambda + \lambda^2)^2 + 4\lambda} \right) < 1$.
Figure 5: The approximation of $b_\lambda$ from Remark 4.23 is considered. The tangent of the angle between the current iteration point and the limit point is plotted with respect to the iteration number. For $\lambda = 0.5$, we have sublinear convergence since $q_{0.5} = 1$.
Figure 6: The ratio $\tan \varphi_{k+1,1} / \tan \varphi_{k,1}$ is plotted for $\lambda = 0.46$. The rate of convergence from Theorem 4.21 is for this example equal to 0.847. The plot illustrates that the description of the convergence rate is accurate and sharp.